Abstract: The existing upper and lower bounds between
entropy and error probability are mostly derived from
inequalities among entropy relations, which can introduce
approximations into the analysis. We derive analytical bounds based on
the closed-form solutions of conditional entropy without involving
any approximation. Two basic types of classification errors are
investigated in the context of binary classification problems,
namely, Bayesian and non-Bayesian errors. We theoretically
confirm that Fano's lower bound is an exact lower bound for any
type of classifier in a relation diagram of "error probability vs.
conditional entropy". The analytical upper bounds are obtained
with respect to the minimum prior probability and are tighter
than Kovalevskij's upper bound.

Index Terms: Entropy, error probability, Bayesian errors,
analytical, upper bound, lower bound

I. Introduction 

In information theory, the relation between entropy and error
probability is one of the important fundamentals. Among
the related studies, one milestone is Fano's inequality (also
known as Fano's lower bound on the error probability of
decoders), which was originally proposed in 1952 by Fano but
formally published in 1961 [1]. It is well known that Fano's
inequality plays a critical role in deriving other theorems and
criteria in information theory [2] [3] [4]. However, within the
research community, there is no consensus on who first
developed the upper bound on the error probability.
According to [6] [7], Kovalevskij [8] was possibly
the first to derive the upper bound of the error probability in
relation to entropy, in 1965. Later, several researchers, such as
Chu and Chueh in 1966 [9], Tebbe and Dwyer III in 1968 [10], and
Hellman and Raviv in 1970 [11], independently developed
upper bounds.

The upper and lower bounds of error probability have been a
long-standing topic in studies on information theory [12] [13]
[14] [15] [16] [17] [18] [19] [20] [6] [7]. However, we consider two
issues that have received less attention in these studies:

I. What are the "analytical bounds", that is, bounds derived
without applying any approximation?

II. What is the interpretation of each bound, or of some key
points, in a given diagram of entropy and error probability?

On the first issue, we define "analytical bounds" to be those
derived from closed-form solutions, rather than from inequality
approximations. Generally, exact bounds are desirable from
the viewpoint of both theory and applications. The second issue
suggests the need for a better understanding of the bounds
in the relation between entropy and error probability. For example,
some key points located on the bounds may carry specific
interpretations that offer theoretical insight or practical meaning.

The above issues form the motivation behind this work.
We establish analytical bounds based on closed-form solutions.
Furthermore, we study the bounds over a wider range of error
types, i.e., Bayesian and non-Bayesian. Non-Bayesian errors are
also important because most classifications are realized
within this category. We take classification as the problem
background because it is common and easy to understand
from daily experience. We restrict the setting
to binary states and the Shannon entropy definition so that
the analytical approach is highlighted. Based on this
principle, one can extend the study to more general
classification settings, such as multiple-class (or multihypothesis)
problems, and to other definitions of entropy, such as Rényi
entropy.

The rest of this paper is organized as follows. Section II
presents related works on the bounds. Section III gives several
definitions for the classification problem background.
The analytical bounds are given and
discussed for Bayesian and non-Bayesian errors in Sections
IV and V, respectively. Interpretations of some key points are
presented in Section VI. Finally, Section VII concludes
the work and presents some discussions.



II. Related Works 

Two important bounds are introduced first; they form the
baselines for comparison with the analytical bounds. Both
were derived from inequality conditions [1] [8]. Suppose that
the random variables $X$ and $Y$ represent the input and output
messages (out of $m$ possible messages), and that the conditional
entropy $H(X|Y)$ represents the average amount of information
lost on $X$ when $Y$ is given. Fano's lower bound [1] is
given in the form:

$$H(X|Y) \le H(e) + P_e \log_2 (m - 1), \qquad (1)$$

where $P_e$ is the error probability, and $H(e)$ is the associated
binary Shannon entropy defined by [21]:

$$H(e) = -P_e \log_2 P_e - (1 - P_e) \log_2 (1 - P_e). \qquad (2)$$

The base of the logarithm is 2 so that the units are "bits".
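As a small illustration of eqs. (1) and (2), the following Python sketch evaluates the binary entropy and the right-hand side of Fano's inequality. It is not part of the original paper; the function names `binary_entropy` and `fano_rhs` are hypothetical.

```python
import math

def binary_entropy(p):
    """Binary Shannon entropy H(e) of eq. (2), in bits; 0*log2(0) is taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def fano_rhs(pe, m):
    """Right-hand side of eq. (1): H(e) + P_e * log2(m - 1)."""
    return binary_entropy(pe) + pe * math.log2(m - 1)

# For m = 2 the second term vanishes, so the bound reduces to H(e).
print(fano_rhs(0.5, 2))
print(fano_rhs(0.25, 4))
```

For binary problems the term with $\log_2(m - 1)$ is zero, which is why the later sections work with $H(e)$ alone.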
The upper bound is given by Kovalevskij [8] in a piecewise
linear form:

$$H(X|Y) \ge \log_2 k + k(k + 1) \left(\log_2 \frac{k + 1}{k}\right) \left(P_e - \frac{k - 1}{k}\right), \qquad (3)$$

where $k$ is a positive integer with $\frac{k - 1}{k} \le P_e \le \frac{k}{k + 1}$,
and $k < m$, $m \ge 2$.

For a binary classification ($m = 2$), the Fano-Kovalevskij
bounds become:

$$H^{-1}(e) \le P_e \le \frac{H(X|Y)}{2}, \qquad (4)$$

where $H^{-1}(e)$ is the inverse of $H(e)$.

Several different bound diagrams between error probability
and entropy have been reported in the literature. The first
difference lies in the entropy definition, such as Shannon
entropy in [12] Gil E2 ED, and Rényi entropy in iTBl [16] iTTll.
The second difference is the selection of bound relations,
such as "$P_e$ vs. $H(X|Y)$" in [12], "$H(X|Y)$ vs. $P_e$" in E)
[13] [16] Q, "$P_e$ vs. $MI(X, Y)$" in JSJ, and "$NMI(X, Y)$
vs. $A$" in [22], where $A$ is the accuracy rate, and $MI(X, Y)$
and $NMI(X, Y)$ are the mutual information and normalized
mutual information, respectively, between variables $X$ and
$Y$. Wang and Hu [22] were the first to derive the analytical
relations of mutual information with respect to accuracy,
precision, and recall, together with their analytical bounds. However,
they did not consider the Bayesian error constraint. When the
Bayesian error constraint was added to the bound relation
in [23], the upper bound taken from [8] was not an analytical one.
Because the existing bounds are derived from inequalities with
approximations, some investigations [17] [18] [20] have been
reported on improving bound tightness.

III. Related Definitions 

Binary classifications are considered in this work. A theoretical
derivation of the relations between entropy and error
probability is achieved based on the joint probability $p(t, y)$
in classifications, where $t \in T = \{t_1, t_2\}$ is the true (or
target) state within two classes, and $y \in Y = \{y_1, y_2\}$ is
the classification output. The simplified notation $p_{ij} =
p(t, y) = p(t = t_i, y = y_j)$ is used in this work. Several
definitions are given below.

Definition 1 (Joint probability in binary classifications): In
the context of binary classifications, the joint probability $p(t, y)$
is defined in a generic setting as:

$$p_{11} = p_1 - e_1, \quad p_{12} = e_1, \quad p_{21} = e_2, \quad p_{22} = p_2 - e_2, \qquad (5)$$

where $p_1$ and $p_2$ are the prior probabilities of Class 1 and
Class 2, respectively; their associated error probabilities are
denoted by $e_1$ and $e_2$, respectively. For the Bayesian decision,
$p_1$ and $p_2$ are always known. The constraints in eq. (5) are
given by:

$$0 < p_1 < 1, \quad 0 < p_2 < 1, \quad p_1 + p_2 = 1, \quad 0 \le e_1 \le p_1, \quad 0 \le e_2 \le p_2. \qquad (6)$$

Definition 2 (Bayesian error and non-Bayesian error):
"Bayesian error" is defined to be the theoretically lowest error
in classifications [25], and is denoted by $P_e$. Hence, the other
errors are "non-Bayesian errors", and are denoted by $P_E$ ($\ge P_e$
for the same probability distributions).

Definition 3 (Error probability calculation): In binary
classifications, both error probabilities are calculated from the same
formula:

$$e \ (P_e \text{ or } P_E) = p_{12} + p_{21}, \qquad (7)$$

where $e$ denotes an error variable that makes no distinction
between error types.
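Definitions 1 and 3 translate directly into a few lines of code. The following Python sketch is illustrative only; the helper names `joint_probability` and `error_probability` are hypothetical, and the matrix layout (rows as true classes, columns as outputs) is an assumption made for this example.

```python
def joint_probability(p1, e1, e2):
    """Joint matrix [[p11, p12], [p21, p22]] of eq. (5); rows are true classes."""
    p2 = 1.0 - p1
    # Constraints of eq. (6): valid priors and errors bounded by their priors.
    if not (0.0 < p1 < 1.0 and 0.0 <= e1 <= p1 and 0.0 <= e2 <= p2):
        raise ValueError("constraints of eq. (6) violated")
    return [[p1 - e1, e1], [e2, p2 - e2]]

def error_probability(joint):
    """Error of eq. (7): the off-diagonal mass p12 + p21."""
    return joint[0][1] + joint[1][0]

joint = joint_probability(p1=0.7, e1=0.1, e2=0.05)
print(joint, error_probability(joint))
```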

Definition 4 (Minimum and maximum error bounds in
classifications): Classifications suggest the minimum error
bound as:

$$(P_E)_{\min} = (P_e)_{\min} = 0, \qquad (8)$$

where the subscript "min" denotes the minimum value. The
maximum error bound for the Bayesian error in binary
classifications is

$$(P_e)_{\max} = p_{\min} = \min\{p_1, p_2\}, \qquad (9)$$

where the symbol "min" denotes the minimum operation. For
the non-Bayesian error, the maximum error bound becomes

$$(P_E)_{\max} = 1. \qquad (10)$$

Definition 5 (Admissible area, point and their properties
in a diagram of entropy and error probability): In a given
diagram of entropy and error probability, we define the area
enclosed by the bounds to be the "admissible area" if every point
inside the area can be realized from classifications;
we call those points "admissible points". If a point
cannot be realized from classifications, it is a
"non-admissible point". A non-admissible point can only be located
on or outside the boundary of the admissible area. If every point
located on the boundary is admissible, we call the admissible
area "closed". If one or more points on the boundary are
non-admissible, the area is said to be "open".

IV. Analytical upper and lower bounds for 
Bayesian errors 

All analytical bounds are derived from a closed-form relation
between conditional entropy and error probabilities (see
Appendix A). The analytical lower bound for Bayesian errors
is:

$$P_e \ge \max\{0, G_1(H(T|Y))\}, \qquad (11)$$

where $H(T|Y)$ is the conditional entropy for the random
variables $T$ and $Y$, and $G_1$ is called the "analytical lower
bound function" (or "analytical lower bound" for short) and
satisfies the following relations with respect to the error
variable $e$:

$$e = G_1(H(T|Y)) = H^{-1}(e), \quad \text{and} \quad H(T|Y) = G_1^{-1}(e) = H(e) = -P_e \log_2 P_e - (1 - P_e) \log_2 (1 - P_e). \qquad (12)$$

Fig. 1. Plot of "$P_e$ vs. $H(T|Y)$" giving the analytical upper bounds,
Kovalevskij's upper bound and Fano's lower bound.

The analytical upper bound is given by:

$$P_e \le \min\{p_{\min}, G_2(H(T|Y))\}, \qquad (13)$$

where $G_2$ is the "analytical upper bound function", for
which the following relation holds:

$$H(T|Y) = G_2^{-1}(e) = p_{\min} \log_2 \frac{e + p_{\min}}{p_{\min}} + e \log_2 \frac{e + p_{\min}}{e}. \qquad (14)$$

In eq. (14), $p_{\min}$ is known, because for Bayesian classifications
$p_1$ and $p_2$ are given.
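To see how eq. (14) compares with Kovalevskij's bound, the sketch below evaluates both as entropy values at the same error $e$. It is an illustration, not the authors' code: `g2_inverse` is a hypothetical name, and Kovalevskij's binary bound is taken as $H = 2e$, which follows from eq. (3) with $k = 1$.

```python
import math

def g2_inverse(e, p_min):
    """H(T|Y) on the analytical upper bound, eq. (14), for 0 <= e <= p_min."""
    if e == 0.0:
        return 0.0  # both terms vanish in the limit e -> 0
    s = e + p_min
    return p_min * math.log2(s / p_min) + e * math.log2(s / e)

def kovalevskij_entropy(e):
    """Entropy on Kovalevskij's bound for m = 2, i.e. eq. (3) with k = 1."""
    return 2.0 * e

p_min = 0.2
for e in (0.05, 0.1, 0.2):
    print(e, g2_inverse(e, p_min), kovalevskij_entropy(e))
```

At a fixed error, a larger entropy on the analytical curve means that, read the other way, the same entropy allows a smaller error than Kovalevskij's line; the two values coincide at $e = p_{\min}$.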

Fig. 1 depicts three analytical upper bounds together with
Fano's lower bound and Kovalevskij's upper bound in the
graph of "$P_e$ vs. $H(T|Y)$". Several findings can be observed
from the novel upper bounds.

I. If $p_1 \ne p_2$, the analytical upper bound is formed by
one curve and one line. It is lower than Kovalevskij's
upper bound except at two specific points: the origin,
O, and one corner point, B or C, in Fig. 1.

II. If $p_1 = p_2$, the analytical upper bound becomes a single
curve, which is also lower than Kovalevskij's upper bound
except at the two end-points, O and A.

III. The analytical upper bounds, whether curved or linear, are
controlled by $p_{\min}$.

IV. The admissible area for Bayesian decisions is closed. Its
shape changes depending on the value of $p_{\min}$. For example,
the area enclosed by the two-curve-one-line boundary "O - C' - C - O"
in Fig. 1 corresponds to classifications with
$p_{\min} = 0.2$. The line boundary shows the maximum error for
Bayesian decisions, $(P_e)_{\max} = p_{\min}$, in binary classifications.

Interpretations of the analytical bounds are given below in
the context of binary classifications. Further discussions of
some specific points are given in Section VI.

Fano's lower bound: In [2], a marginal probability distribution
is applied to explain the equality in Fano's lower
bound (see eq. (2-144) in [2]):

$$p(y) = \left(1 - P_e, \frac{P_e}{m - 1}, \ldots, \frac{P_e}{m - 1}\right). \qquad (15)$$

Because we derive the bound based on the joint probability
distributions in (5), novel explanations can be obtained. A
generic classification setting can represent this bound:

$$e_1 = \frac{p_1 (p_2 - e_2)}{p_2}, \quad \text{or} \quad e_2 = \frac{p_2 (p_1 - e_1)}{p_1}. \qquad (16)$$

The setting above is derived based on the minimum relations
(or Property 7 in [26]). Eq. (16) describes an extremal property
in the relations between entropy and error probability, but is
expressed in terms of the error probabilities.

Based on eq. (16), a specific classification setting can be
obtained, in which a minority class (say,
Class 2) is classified entirely into the majority class (Class 1):

$$p_{11} = p_1, \quad p_{12} = 0, \quad p_{21} = p_2 = e, \quad p_{22} = 0. \qquad (17)$$

Eq. (17) results in a zero value of the mutual information,
which implies "no correlation" [25] between the two variables
$T$ and $Y$, or "zero information" [27] from the classification
decisions. It also indicates that the two variables are
"statistically independent" Q. In [23], Hu demonstrated that Bayesian
classifiers will obtain such solutions for $p_1 > p_2$ when
processing extremely skewed classes with no cost terms given.
One can also observe that eq. (17) is equivalent to (15) when
$m = 2$.
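The claim that the generic setting of eq. (16) yields zero mutual information can be checked numerically. The Python sketch below is only an illustration under the joint-probability layout of eq. (5); the names `mutual_information` and `fano_setting` are hypothetical.

```python
import math

def mutual_information(joint):
    """MI(T, Y) in bits for a joint matrix; zero cells contribute nothing."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pij in enumerate(row):
            if pij > 0.0:
                mi += pij * math.log2(pij / (rows[i] * cols[j]))
    return mi

def fano_setting(p1, e2):
    """Joint matrix of eq. (5) with e1 chosen by eq. (16)."""
    p2 = 1.0 - p1
    e1 = p1 * (p2 - e2) / p2
    return [[p1 - e1, e1], [e2, p2 - e2]]

for e2 in (0.05, 0.1, 0.2):
    print(e2, mutual_information(fano_setting(0.7, e2)))
```

Every choice of $e_2$ gives a mutual information that is zero up to rounding, so $H(T|Y) = H(T)$ along this family of settings.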

Analytical upper bound: Supposing $p_1 > p_2$, a specific
classification setting can be obtained for representing this
bound:

$$p_{11} = p_1 - e_1, \quad p_{12} = e_1 = e, \quad p_{21} = 0, \quad p_{22} = p_2. \qquad (18)$$

Eq. (18) suggests the generic conditions $e_i = e$ if $p_i > p_j$,
with $i \ne j$ and $i, j = 1, 2$, for another extremal property in the
relations between entropy and error probability. Hence, the analytical
upper bound function corresponds to a zero value for the
conditional probability, or the maximum value for the mutual
information.
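As a consistency check between the setting of eq. (18) and the closed form of eq. (14), the following Python sketch computes $H(T|Y)$ directly from the joint matrix and compares it with the formula. The function names are hypothetical and the example values are arbitrary.

```python
import math

def conditional_entropy(joint):
    """H(T|Y) in bits; rows are true classes, columns are outputs."""
    cols = [sum(c) for c in zip(*joint)]
    return -sum(pij * math.log2(pij / cols[j])
                for row in joint for j, pij in enumerate(row) if pij > 0.0)

def g2_inverse(e, p_min):
    """Closed form of eq. (14), for e > 0."""
    s = e + p_min
    return p_min * math.log2(s / p_min) + e * math.log2(s / e)

def upper_bound_setting(p1, e):
    """Joint matrix of eq. (18): all errors fall on the majority class."""
    return [[p1 - e, e], [0.0, 1.0 - p1]]

p1, e = 0.8, 0.1
print(conditional_entropy(upper_bound_setting(p1, e)), g2_inverse(e, 1.0 - p1))
```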

V. Analytical upper and lower bounds for 
non-Bayesian errors 

In the context of classification problems, Bayesian errors can
be realized only if one has exact information about all
probability distributions [25]. This assumption generally
cannot be met in real applications. Therefore, the analysis of
non-Bayesian errors is also of significant interest.

Fano's lower bound holds for all classifications.
The bound is general and independent of the error type
and of information about $p_1$ and $p_2$. If no information is given
about $p_1$ and $p_2$, we obtain a "general upper bound" for
non-Bayesian errors in the form:

$$P_E \le 1 - H^{-1}(e) = 1 - G_1(H(T|Y)), \qquad (19)$$

which is a mirror of Fano's lower bound with the mirror axis
along $P_E = 0.5$. If one has information about $p_1$ and $p_2$, the
analytical upper bound of $P_E$ is

$$P_E \le G_2(H(T|Y)), \quad \text{for } H(T|Y) \le H(T|Y)_{\max} \text{ and } P_E \in [0, 0.5], \qquad (20)$$

where $H(T|Y)_{\max}$ is the "upper bound of $H(T|Y)$" and is
calculated from:

$$H(T|Y)_{\max} = H(e = p_{\min}). \qquad (21)$$

The analytical upper bound described in (14) also forms a
"mirrored analytical upper bound", which is effective
for $P_E \in [0.5, 1.0]$.
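The general bounds for non-Bayesian errors can be sketched as an interval for $P_E$ at a given entropy. The Python code below is an illustration only: it covers Fano's lower bound, its mirror in eq. (19), and the entropy limit of eq. (21), with hypothetical function names and a bisection-based inverse assumed on $[0, 0.5]$.

```python
import math

def binary_entropy(p):
    """Binary Shannon entropy in bits, with 0*log2(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def inverse_binary_entropy(h, tol=1e-12):
    """Return e in [0, 0.5] with H(e) = h, by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def general_error_interval(h):
    """Fano's lower bound and its mirror, eq. (19), as an interval for P_E."""
    lower = inverse_binary_entropy(h)
    return lower, 1.0 - lower

def max_conditional_entropy(p_min):
    """H(T|Y)_max of eq. (21)."""
    return binary_entropy(p_min)

print(general_error_interval(0.5))
print(max_conditional_entropy(0.2))
```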

From the graph of "$P_E$ vs. $H(T|Y)$" (Fig. 2), observations
for non-Bayesian errors can be summarized as follows:

I. In general, if no information exists about $p_1$ and $p_2$, the
admissible area is formed by Fano's lower bound, its mirrored
bound, and the axis of $P_E$, that is, the two-curve-one-line
boundary "O - A - D - O" in Fig. 2. This area covers
all other admissible areas formed from analytical bounds for
which information about $p_1$ and $p_2$ is applied.

II. If $p_1$ and $p_2$ are known, the admissible area is
formed from the analytical upper bound, its mirrored bound,
and the upper bound $H(T|Y)_{\max}$. The area is controlled by
$p_{\min}$. For example, if $p_{\min} = 0.2$, the area is enclosed by the
four-curve-one-line boundary "O - F' - F - D - A' - O"
in Fig. 2. However, if $p_1 = p_2 = 0.5$, two admissible areas
are formed. Their two-curve boundaries are
"O - F' - A - O" and "D - F - A - D", respectively.

III. All admissible areas, whether with or without information
about $p_1$ and $p_2$, are closed. The areas are formed differently
with respect to the given information. The more information
available, the tighter the bounds become, that is, the smaller the
admissible areas become. In general, the non-Bayesian error $P_E$
can be higher than Kovalevskij's bound.

General upper bound of non-Bayesian errors: For non-Bayesian
classifications, eq. (5) with the condition $P_E = e_1 + e_2 > 0.5$
describes a general classification setting that represents the
general upper bound. Two specific settings can be obtained
for demonstration. One setting is described by eq. (17) with
$p_1 < p_2$. The other setting is

$$p_{11} = 0.5 - P_E/2, \quad p_{12} = P_E/2, \quad p_{21} = P_E/2, \quad p_{22} = 0.5 - P_E/2. \qquad (22)$$
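The setting of eq. (22) can be checked against the binary entropy: for this symmetric joint distribution, $H(T|Y)$ equals $H(P_E)$, which places the point on Fano's bound or on its mirror. The sketch below is an illustration with hypothetical function names.

```python
import math

def binary_entropy(p):
    """Binary Shannon entropy in bits, with 0*log2(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def conditional_entropy(joint):
    """H(T|Y) in bits; rows are true classes, columns are outputs."""
    cols = [sum(c) for c in zip(*joint)]
    return -sum(pij * math.log2(pij / cols[j])
                for row in joint for j, pij in enumerate(row) if pij > 0.0)

def symmetric_setting(pe):
    """Joint matrix of eq. (22) for a total error pe."""
    return [[0.5 - pe / 2, pe / 2], [pe / 2, 0.5 - pe / 2]]

for pe in (0.1, 0.5, 0.8):
    print(pe, conditional_entropy(symmetric_setting(pe)), binary_entropy(pe))
```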



Mirrored analytical upper bound: A mirrored analytical upper
bound is formed for the non-Bayesian error under the condition
that $p_1$ and $p_2$ are known. This bound in fact serves as a lower
bound for $P_E \in [0.5, 1.0]$. Supposing $p_1 > p_2$, a specific setting
in classifications can be found for representing the mirrored
bound:

$$p_{11} = p_1 - e_1, \quad p_{12} = e_1, \quad p_{21} = p_2, \quad p_{22} = 0, \quad e > 0.5. \qquad (23)$$

VI. Interpretations to some key points 

Further interpretations are given for the key points shown in
Fig. 1 and Fig. 2. These key points may hold special features
in classifications.

Point O: This point represents a zero value of $H(T|Y)$.
It also suggests a "perfect classification" without any error
($P_e = P_E = 0$) by a specific setting of the joint probability:

$$p_{11} = p_1, \quad p_{12} = 0, \quad p_{21} = 0, \quad p_{22} = p_2. \qquad (24)$$

Fig. 2. Plot of "$P_E$ vs. $H(T|Y)$" giving the analytical bounds and the
mirrored bounds. [Curve labels in the figure: Fano's lower bound; analytical upper bound.]



Point A: This point represents the maximum value
$H(T|Y) = 1$ for "class-balanced" classifications ($p_1 = p_2$).
Three specific classification settings can be obtained for
representing this point. Two of the settings are actually
"no classification":

$$p_{11} = 1/2, \ p_{12} = 0, \ p_{21} = 1/2, \ p_{22} = 0, \quad \text{or} \quad p_{11} = 0, \ p_{12} = 1/2, \ p_{21} = 0, \ p_{22} = 1/2. \qquad (25)$$

The other one is "random guessing":

$$p_{11} = 1/4, \quad p_{12} = 1/4, \quad p_{21} = 1/4, \quad p_{22} = 1/4. \qquad (26)$$

Point D: This point occurs for non-Bayesian classifications
in the form:

$$p_{11} = 0, \quad p_{12} = p_1, \quad p_{21} = p_2, \quad p_{22} = 0. \qquad (27)$$

In this case, one can exchange the labels to obtain a perfect
classification.
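The coordinates of Points O, A and D can be reproduced from their joint-probability settings. The Python sketch below is illustrative: it uses eqs. (24), (26) and (27), arbitrary priors $p_1 = 0.7$, $p_2 = 0.3$ for O and D, and hypothetical helper names.

```python
import math

def conditional_entropy(joint):
    """H(T|Y) in bits; rows are true classes, columns are outputs."""
    cols = [sum(c) for c in zip(*joint)]
    return -sum(pij * math.log2(pij / cols[j])
                for row in joint for j, pij in enumerate(row) if pij > 0.0)

def error_probability(joint):
    """Error of eq. (7): p12 + p21."""
    return joint[0][1] + joint[1][0]

p1, p2 = 0.7, 0.3
key_points = {
    "O": [[p1, 0.0], [0.0, p2]],        # eq. (24), perfect classification
    "A": [[0.25, 0.25], [0.25, 0.25]],  # eq. (26), random guessing with p1 = p2
    "D": [[0.0, p1], [p2, 0.0]],        # eq. (27), labels exchanged
}
for name, joint in key_points.items():
    print(name, conditional_entropy(joint), error_probability(joint))
```

Points O and D share a zero conditional entropy but sit at opposite ends of the error axis, which is the sense in which exchanging the labels at D recovers a perfect classification.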

Points B (or C) and B' (or C'): Suppose $p_1 > p_2$. The
specific setting is:

$$p_{11} = p_1 - p_2, \quad p_{12} = p_2, \quad p_{21} = 0, \quad p_{22} = p_2, \qquad (28)$$

for Point B when $p_2 = 0.4$ (or Point C when $p_2 = 0.2$), and
two specific settings for Point B' (or Point C') are:

$$p_{11} = p_1, \quad p_{12} = 0, \quad p_{21} = p_2, \quad p_{22} = 0, \qquad (29)$$

$$p_{11} = 0.5 - p_2/2, \quad p_{12} = p_2/2, \quad p_{21} = p_2/2, \quad p_{22} = 0.5 - p_2/2. \qquad (30)$$

Points E (or F) and E' (or F'): Suppose $p_1 > p_2$. The
specific setting is:

$$p_{11} = 0, \quad p_{12} = p_1, \quad p_{21} = 0, \quad p_{22} = p_2, \qquad (31)$$

for Point E when $p_2 = 0.3$ (or Point F when $p_2 = 0.1$), and
eq. (30) for Point E' (or F') with the given value of $p_2$.

Point A': Suppose $p_1 > 0.5$. The specific setting for Point
A' is:

$$p_{11} = p_1 - 0.5, \quad p_{12} = 0.5, \quad p_{21} = 0, \quad p_{22} = p_2. \qquad (32)$$

Points Q and R: These two points are special because of their
positions in the diagrams. For both types of errors, they are
considered to be "non-admissible points" in the diagrams,
because no setting exists in binary classifications that can
represent them.



VII. Final remarks 

This work investigates analytical bounds between entropy
and error probability. Two specific schemes are applied
in the theoretical derivation. One scheme is the use of
joint probability distributions, from which more general
interpretations can be obtained for understanding the bounds. The
other scheme is the closed-form solution of the maximization
or minimization of the related functions. We derived the
analytical bounds for both Bayesian and non-Bayesian
errors. While a new interpretation is given to Fano's
lower bound, the analytical upper bounds obtained are
tighter than Kovalevskij's upper bound.

To emphasize the importance of the study, we discuss below
the selection of learning targets between
error and entropy from the perspective of machine learning.
The analytical bounds derived in this work provide a novel
way to link both learning targets in the related studies.
Error-based learning is more conventional because of its
compatibility with our intuitions in daily life, such as "trial
and error". Significant studies have been reported under this
category. In comparison, information-based learning [28] is
new and uncommon in applications such as classifications.
Entropy is not a well-accepted concept related to our intuition
in decision making. This is one of the reasons why the
learning target is chosen mainly based on error, rather than
on entropy. However, we consider that error is an empirical
concept, whereas entropy is generally more theoretical. In [29],
we demonstrated that entropy can deal with both concepts of
"error" and "reject" in abstaining classifications.
Information-based learning [28] presents a promising and wider perspective
for exploring and interpreting learning mechanisms.

When considering all sides of the issues stemming from
machine learning studies, we believe that "what to learn" is a
primary problem. However, it seems that more investigation
has focused on the issue of "how to learn". Moreover, in
comparison with the long-standing yet active theme of "feature
selection", little study has been done from the perspective of
"learning target selection". We propose that this theme should
be emphasized in the study of machine learning. Hence, the
relations studied in this work are important in
that researchers using either error-based or entropy-based
approaches can reach a better understanding
of the counterpart approach.



Appendix A 
Proofs of the analytical bounds 

For a binary classification, a closed-form relation between
conditional entropy and error probabilities is derived from the joint
probability (5):

$$\begin{aligned}
H(T|Y) &= H(T) - MI(T, Y) \\
&= -p_1 \log_2 p_1 - p_2 \log_2 p_2 \\
&\quad - e_1 \log_2 \frac{e_1}{(p_2 + e_1 - e_2)\, p_1} - e_2 \log_2 \frac{e_2}{(p_1 - e_1 + e_2)\, p_2} \\
&\quad - (p_1 - e_1) \log_2 \frac{p_1 - e_1}{(p_1 - e_1 + e_2)\, p_1} - (p_2 - e_2) \log_2 \frac{p_2 - e_2}{(p_2 + e_1 - e_2)\, p_2}.
\end{aligned} \qquad (A1)$$

Based on eq. (A1), the analytical functions of the lower bound
and upper bound are derived from the following
definitions, respectively:

$$G_1^{-1}(e, p_{\min}) = \arg\max H(T|Y), \qquad (A2)$$

$$G_2^{-1}(e, p_{\min}) = \arg\min H(T|Y). \qquad (A3)$$

The meanings of "lower" and "upper" are exchanged in (A2)
and (A3), respectively, because the input variable is $e$ in the
derivations. The single independent parameter is $p_{\min}$,
which is assumed to be known in the derivations.

However, in the background of binary classifications, the
function $H(T|Y)$ in (A1) has two independent variables, $e_1$
and $e_2$. This feature makes it difficult to derive
(A2) or (A3) directly as a function of the single variable $e$. The
difficulty is that multiple solutions of $e_1$ and $e_2$ lead to the same bound,
which makes the derivation tedious. To overcome this
difficulty, we adopt Maple™ 9.5 (a registered trademark of
Waterloo Maple, Inc.) for implementing the derivations. Using
the Maple code shown in Appendix B, one can confirm
the derivations easily for the multiple-to-one relations of the
bound.
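The closed form (A1) can be cross-checked numerically against a direct cell-by-cell computation of $H(T|Y)$. The Python sketch below is an independent illustration, not a translation of the Maple session; all function names are hypothetical.

```python
import math

def term(x, denom):
    """x * log2(x / denom), with the convention 0 * log2(0) = 0."""
    return x * math.log2(x / denom) if x > 0.0 else 0.0

def conditional_entropy_a1(p1, e1, e2):
    """H(T|Y) from eq. (A1) as a function of p1, e1 and e2 (p2 = 1 - p1)."""
    p2 = 1.0 - p1
    q1 = p1 - e1 + e2  # p(y1) = p11 + p21
    q2 = p2 + e1 - e2  # p(y2) = p12 + p22
    entropy_t = -term(p1, 1.0) - term(p2, 1.0)
    mutual_info = (term(e1, q2 * p1) + term(e2, q1 * p2)
                   + term(p1 - e1, q1 * p1) + term(p2 - e2, q2 * p2))
    return entropy_t - mutual_info

def conditional_entropy_direct(p1, e1, e2):
    """H(T|Y) computed cell by cell from the joint matrix of eq. (5)."""
    p2 = 1.0 - p1
    joint = [[p1 - e1, e1], [e2, p2 - e2]]
    cols = [joint[0][0] + joint[1][0], joint[0][1] + joint[1][1]]
    return -sum(term(pij, cols[j]) for row in joint for j, pij in enumerate(row))

print(conditional_entropy_a1(0.7, 0.1, 0.05),
      conditional_entropy_direct(0.7, 0.1, 0.05))
```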

Proof: On the analytical lower bound function
$G_1^{-1}(e, p_{\min})$.

From information theory [2], one has the following
conditions for mutual information:

$$0 \le MI(T, Y) \le H(T) = H(e). \qquad (A4)$$

Hence, eq. (A1) shows that $H(T|Y)$ attains its maximum when
$MI(T, Y) = 0$. We can show that
the generic classification setting in eq. (16) results in the
condition $MI(T, Y) = 0$. Using the Maple code, one can
substitute either condition from (16) into (A1), and always
arrive at the same results: $MI(T, Y) = 0$ and the analytical
lower bound function in terms of $e$ and $p_{\min}$. ■
Proof: On the analytical upper bound function
$G_2^{-1}(e, p_{\min})$.

Eq. (A1) implies that minimizing $H(T|Y)$ is equivalent to
maximizing $MI(T, Y)$. To obtain a single-variable function
in (A3), we need to solve the following problem first:

$$\arg\max MI(T, Y), \quad \text{given } e_2, \qquad (A5)$$

where $MI$ is described implicitly by the two independent
variables $e$ and $e_2$. Because of the high nonlinearity of
$MI$, we are unable to obtain a direct relation between $e$ and
$e_2$. Therefore, we solve the problem (A5) by examining the
derivative of $MI(T, Y)$ with respect to $e$:

$$\frac{\partial}{\partial e} MI(T, Y) = \log_2 \frac{(-1 + p_2 + e - 2 e_2)(e - e_2)}{(-1 + p_2 + e - e_2)(e - 2 e_2 + p_2)}, \qquad (A6)$$

where we consider $e_2$ and $p_2$ as constants. Under the
condition $1 > p_1 > p_2 > e > e_2 > 0$, one can prove that (A6)
is always negative and without singularity. Hence, $MI(T, Y)$
is a monotonically decreasing function of $e$ under
the given condition. The maximum of $MI(T, Y)$ requires the
smallest $e$. From $e = e_1 + e_2$ and the given $e_2$ in (A5), one
can derive the solutions $e = e_1$ and $e_2 = 0$. The specific
classification setting associated with these solutions is shown in
(18). For the other conditions with the same value of $e$, one
always obtains the same maximum value of $MI(T, Y)$.
The analytical upper bound function is always the same
in terms of $e$ and $p_{\min}$. ■

The proof of the mirrored bounds follows directly from
the same principle and is omitted here.
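The derivative in (A6) can be sanity-checked against a central finite difference of the mutual information. The following Python sketch is only a numerical illustration at one arbitrary point satisfying $1 > p_1 > p_2 > e > e_2 > 0$; the function names and the step size are assumptions of this example.

```python
import math

def mutual_information(p2, e, e2):
    """MI(T, Y) in bits for eq. (5) with e1 = e - e2 and p1 = 1 - p2."""
    p1 = 1.0 - p2
    # Each cell: (joint probability, prior of its class, output column index).
    cells = [(p1 - e + e2, p1, 0), (e - e2, p1, 1), (e2, p2, 0), (p2 - e2, p2, 1)]
    q = (p1 - e + 2.0 * e2, p2 + e - 2.0 * e2)  # output marginals p(y1), p(y2)
    return sum(p * math.log2(p / (prior * q[j])) for p, prior, j in cells if p > 0.0)

def mi_derivative(p2, e, e2):
    """Closed-form derivative of MI with respect to e, in the form of (A6)."""
    num = (-1.0 + p2 + e - 2.0 * e2) * (e - e2)
    den = (-1.0 + p2 + e - e2) * (e - 2.0 * e2 + p2)
    return math.log2(num / den)

p2, e, e2, step = 0.3, 0.2, 0.05, 1e-6
numeric = (mutual_information(p2, e + step, e2)
           - mutual_information(p2, e - step, e2)) / (2.0 * step)
print(mi_derivative(p2, e, e2), numeric)
```

A single point does not prove monotonicity, but agreement between the two numbers gives a quick check that the reconstructed expression matches the mutual information it was derived from.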
Appendix B
Maple code for the derivations

> # Maple code for deriving the analytical lower bound
> restart; # Clean the memory
> # Shannon entropy
> HT:=-p1*log[2](p1)-p2*log[2](p2);
> # Terms of joint probability
> p11:=(p1-e1); p12:=e1; p22:=p2-e2; p21:=e2;
> # For the generic setting in (16)
> e1:=p1*(p2-e2)/p2; p1:=1-p2;
> # Intermediate variables
> q1:=p11+p21; q2:=p12+p22;
> # Mutual information
> MI:=p11*log[2](p11/q1/p1)+p12*log[2](p12/q2/p1);
> MI:=MI+p22*log[2](p22/q2/(1-p1))+p21*log[2](p21/q1/(1-p1));
> MI:=simplify(MI,ln); # Solution of mutual information

MI := 0

> # The analytical lower bound function
> HTY:=simplify((HT-MI),ln);
> # Display of the lower bound function in terms of e and p2

HTY := -(1 - p2) ln(1 - p2)/ln(2) - p2 ln(p2)/ln(2)

> # Maple code for deriving the analytical upper bound
> restart; # Clean the memory
> # Shannon entropy
> HT:=-p1*log[2](p1)-p2*log[2](p2);
> # Terms of joint probability
> p11:=(p1-e1); p12:=e1; p22:=p2-e2; p21:=e2;
> # For error variable
> e1:=e-e2; p1:=1-p2;
> # Intermediate variables
> q1:=p11+p21; q2:=p12+p22;
> # Mutual information
> MI:=p11*log[2](p11/q1/p1)+p12*log[2](p12/q2/p1);
> MI:=MI+p22*log[2](p22/q2/(1-p1))+p21*log[2](p21/q1/(1-p1));
> MI_dif:=simplify(combine(diff(MI,e),ln,symbolic));
> # Display of differential function of MI in (A6)

MI_dif := ln(((-1 + p2 + e - 2 e2) (e - e2))/((-1 + p2 + e - e2) (e - 2 e2 + p2)))/ln(2)

> # For the generic setting in (18)
> e1:=e; e2:=0; p1:=1-p2;
> # Intermediate variables
> q1:=p11+p21; q2:=p12+p22;
> # Mutual information
> MI:=p11*log[2](p11/q1/p1)+p12*log[2](p12/q2/p1);
> # Neglect one term below from the entropy definition of 0*log(0)=0
> MI:=MI+p22*log[2](p22/q2/(1-p1));
> # The analytical upper bound function
> HTY:=combine(simplify(combine(simplify(HT-MI),ln,symbolic)));
> # Display of the upper bound function in terms of e and p2

HTY := (p2 ln((e + p2)/p2) + e ln((e + p2)/e))/ln(2)



